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This article is a tutorial on Markov chain Monte Carlo simulations and 
their statistical analysis. The theoretical concepts are illustrated through 
many numerical assignments from the author's book [7] on the subject. 
Computer code (in Fortran) is available for all subjects covered and can 
be downloaded from the web. 



Contents 



1. Introduction 

Markov chain Monte Carlo (MC) simulations started in earnest with the 
1953 article by Nicholas Metropolis, Arianna Rosenbluth, Marshall Rosen- 
bluth, Augusta Teller and Edward Teller |T2]. Since then MC simulations 
have become an indispensable tool with applications in many branches of 
science. Some of those are reviewed in the proceedings ^3] of the 2003 
Los Alamos conference, which celebrated the 50th birthday of Metropolis 
simulations. 

The purpose of this tutorial is to provide an overview of basic concepts, 
which are prerequisites for an understanding of the more advanced lectures 
of this volume. In particular the lectures by Prof. Landau are closely related. 

The theory behind MC simulations is based on statistics and the analy- 
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sis of MC generated data is applied statistics. Therefore, statistical concepts 
are reviewed first in this tutorial. Nowadays abundance of computational 
power implies also a paradigm shift with respect to statistics: Computation- 
ally intensive, but conceptually simple, methods belong at the forefront. MC 
simulations are not only relevant for simulating models of interest, but they 
constitute also a valuable tool for approaching statistics. 

The point of departure for treating Markov chain MC simulations is the 
Metropolis algorithm for simulating the Gibbs canonical ensemble. The heat 
bath algorithm follows. To illustrate these methods our systems of choice 
are discrete Potts and continuous 0(n) models. Both classes of models are 
programmed for arbitrary dimensions (d = 1, 2, 3, 4, ... ). On the advanced 
side we introduce multicanonical simulations, which cover an entire tem- 
perature range in a single simulation, and allow for direct calculations of 
the entropy and free energy. 

In summary, we consider Statistics, Markov Chain Monte Carlo simu- 
lations, the Statistical Analysis of Markov chain data and, finally, Multi- 
canonical Sampling. This tutorial is abstracted from the author's book on 
the subject 0. Many details, which are inevitably ommitted here, can be 
found there. 

2. Probability Distributions and Sampling 

A sample space is a set of points or elements, in natural sciences called 
measurements or observations, whose occurrence depends on chance. 
Carrying out independent repetitions of the same experiment is called sam- 
pling. The outcome of each experiment provides an event called data point. 
In N such experiments we may find the event A to occur with frequency 
n, < n < N. The probability assigned to the event A is a number P(A), 
< P{A) < 1, so that 



This equation is sometimes called the frequency definition of probabil- 



Let us denote by P(a, b) the probability that x r £ [a, b] where x r is a 
continuous random variable drawn in the interval (— oo,+oo) with the 
probability density f(x). Then, 



P(A) = lim — . 



(1) 



ity. 
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Knowledge of all probabilities P(a, b) implies 

The (cumulative) distribution function of the random variable x r is 
defined as 

F(x) = P{x r < x) = ( f(x)dx. (4) 

J — oo 

A particularly important case is the uniform probability distribution 

for random numbers between [0, 1), 

^) = { lf0r °^ <lj (5) 
I elsewhere. 

Remarkably, the uniform distribution allows for the construction of general 
probability distributions. Let 

y = F(x)= f f(x')dx' 



and assume that the inverse x = F 1 (y) exists. For y r being a uniformly 
distributed random variable in the range [0, 1) it follows that 

x r = F- l (y r ) (6) 

is distributed according to the probability density f(x). 

The Gaussian or normal distribution is of major importance. Its 
probability density is 

g{x) = -L= e-* 2 /^ 2 ) (7) 

where a 2 is the variance and a > the standard deviation. The Gaus- 
sian distribution function G(x) is related to that of variance a 2 = 1 by 



OW - £ ,M ^ = £ £ er^ «■ _ i + i „ (-^) . 

(8) 

In principle we could now generate Gaussian random numbers according 
to Eq. ©. However, the numerical calculation of the inverse error function 
is slow and makes this an impractical procedure. Much faster is to express 
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the product probability density of two independent Gaussian distributions 
in polar coordinates 

_J_ e-W) e -V 2 /(^ 2 ) dx dy = _J_ e ~r*/(2° 2 ) d(j)rdr 
Z7T (J Z7T (J 

and to use the relations 

x r = r r cos r and y r = r r sin <fi r . (9) 

3. Random Numbers and Fortran Code 

According to Marsaglia and collaborators ^7] a list of desirable properties 
for (pseudo) random number generators is: 

(i) Randomness. The generator should pass stringent tests for randomness. 

(ii) Long period. 

(hi) Computational efficiency. 

(iv) Repeatability. Initial conditions (seed values) completely determine 

the resulting sequence of random variables. 

(v) Portability. Identical sequences of random variables may be produced 

on a wide variety of computers (for given seed values). 

(vi) Homogeneity. All subsets of bits of the numbers are random. 

Physicists have added a number of their applications as new tests (e.g., 
see |22| and references therein). In our program package a version of the 
random number generator of Marsaglia and collaborators \T]\ is provided. 
Our corresponding Fortran code consists of three subroutines: 

rmaset . f to set the initial state of the random number generator, 
ranmar . f which provides one random number per call, 
rmasave . f to save the final state of the generator. 

In addition, rmaf un . f is a Fortran function version of ranmar . f and 
calls to these two routines are freely interchangeable. Related is also the 
subroutine rmagau.f , which generates two Gaussian random numbers. 

The subroutine rmaset .f initializes the generator to mutually indepen- 
dent sequences of random numbers for distinct pairs of 

-1801 < iseedl < 29527 and - 9373 < iseed2 < 20708 . (10) 

This property makes the generator quite useful for parallel processing. 
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Fig. 1. The Fortran routines are provided and prepared to run in the tree structure of 
folders depicted in this figure. This tree unfolds from the downloaded file. 

3.1. How to get and run the Fortan code 

To download the Fortran code book visit the website 

http : //b_berg. home. comcast. net/ 

and follow the instructions given there. If the above link should be unavail- 
able, visit the author's homepage which is presently located at 

http://www.hep.fsu.edu/~berg . 

After installation the directory tree shown in Fig. 2] is obtained. ForLib 
contains a library of functions and subroutines which is closed in the sense 
that no reference to non-standard functions or subroutines outside the li- 
brary is ever made. Fortran programs are contained in the folder ForProg 
and procedures for interactive use in ForProc. It is recommended to leave 
the hyperstructure of program dependencies introduced between the levels 
of the STMC directory tree intact. Otherwise, complications may result 
which require advanced Fortran skills. 

Assignment: Marsaglia random numbers. Run the program mar . f 
to reproduce the following results: 



RANMAR INITIALIZED. 


MARSAGLIA 


CONTINUATION. 


idat , 


xr = 1 





116391063 


idat , 


xr = 


1 





495856345 


idat , 


xr = 2 





96484679 


idat , 


xr = 


2 





577386141 


idat , 


xr = 3 





882970393 


idat , 


xr = 


3 





942340136 


idat , 


xr = 4 





420486867 


idat , 


xr = 


4 





243162394 


extra 


xr = 





495856345 


extra 


xr = 







550126791 
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Fig. 2. Gaussian peaked distribution function and estimates of x q for the 70% 
(approximately 1 a) and 95% (approximately 2 a) confidence intervals. 



Understand how to re-start the random number generator and how to 
perform different starts when the continuation data file ranmar . d does 
not exist. You find mar.f in ForProg/Marsaglia and it includes subrou- 
tines from ForLib. To compile properly, mar.f has to be located two lev- 
els down from a root directory STMC. The solution is given in the folder 
Assignment s/a0102_02. 



4. Confidence Intervals and Heapsort 

Let a distribution function F(x) and q, < q < 1 be given. One defines 
q-tiles (also called quantiles or fractiles) x q by means of 



F(x q ) = q. 



(11) 



The median si is often (certainly not always) the typical value of the 
random variable x r . 
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Example: For the normal distribution the precise probability content of 
the confidence intervals 

[Xg,xi- q ] = [— na, na] for n = 1,2 

is p = 1 - 2q = 68.27% for one a and p = 1 - 2q = 95.45% for two a. 
The peaked distribution function 

F '«=(f (I) ,Mr (I) ,7r . < 12 > 

[1 - F[x) for F(x) > |. 

provides a useful way to visualize probability intervals of a distribution. It 
is illustrated in Fig. |2Jfor the Gaussian distribution. 

Sampling provides us with an empirical distribution function and in 
practice the problem is to estimate confidence intervals from the empiri- 
cal data. Assume we generate n random numbers x±, ...,x n independently 
according to a probability distribution F(x). The n random numbers con- 
stitute a sample. We may re-arrange the Xi in increasing order. Denoting 
the smallest value by x ni , the next smallest by x m , etc., we arrive at 



''Til 



< X-K 2 < • • • < x^ n (13) 



where ni, . . . ,ir n is a permutation of 1, . . . , n. Each of the x 7Ti is called an 
order statistic. An estimator for the distribution function F(x) is the 
empirical distribution function 
i 

F(x) = — for x v . < x < x v ... , i = 0, 1, . . . , n — 1, n (14) 
n 

with the definitions x. KQ = — oo and x nrl+1 = +00. 

To calculate F(x) and the corresponding peaked distribution function, 
one needs an efficient way to sort n data values in ascending (or descending) 
order. This is provided by the heapsort, which relies on two steps: First the 
data are arranged in a heap, then the heap is sorted. A heap is a partial 
ordering so that the number at the top is larger or equal than the two 
numbers in the second row, provided at least three numbers Xi exist. More 
details are given in [J]. The computer time needed to succeed with this 
sorting process grows only like n log 2 n, because there are log 2 n levels in 
the heap, see Knuth j!5| for an exhaustive discussion of sorting algorithms. 



5. The Central Limit Theorem and Binning 

How is the sum of two independent random variables 

y r = xl + x r 2 . 



(15) 
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distributed? We denote their probability density of y r by g(y). The corre- 
sponding cumulative distribution function is given by 



G(y) 



fi(xi) /a (2:2) dx x dx 2 



xi+x 2 <y 



fi(x) F 2 {y - x) dx 



where F 2 (x) is the distribution function of the random variable x 2 . We take 
the derivative and obtain the probability density of y r 



g(y) 



dG(y) 
dy 



+ OO 



fi(x) f2{y-x) dx 



(16) 



The probability density of a sum of two independent random variables is 
the convolution of the probability densities of these random variables. 

Example: Sums of uniform random numbers, corresponding to the sums 
of an uniformly distributed random variable x r € (0, 1]: 

(a) Let y r = x r + x r , then 



(b) Let y' 



9a(y) 



y for < y < 1, 
32(2/) = { 2-y for 1 < y < 2, 
elsewhere. 

+ x r ', then 

y 2 /2 for < y < 1, 
(-2y 2 + 6y-3)/2 for 1 < y < 2, 
(y-3) 2 /2 for 2 < y < 3, 
elsewhere. 



(17) 



(18) 



The convolution l|16fl takes on a simple form in Fourier space. In 
statistics the Fourier transformation of the probability density is known 
as characteristic function, defined as the expectation value of e ltx : 



A straightforward calculation gives 

4>(t) = exp 



+00 



e ltx f{x) dx 



la, 
2 N 



x t 2 



(19) 



(20) 



for the characteristic function of the Gaussian probability density J7J. The 
characteristic function is particularly useful for investigating sums of ran- 
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dom variables, y r — x\ + x r 2 : 

(j) y {t) = ( e («*J+i*»S)) (21) 

- oo /* + oo 

e 4tel e^ 2 h{ Xl ) f 2 (x 2 ) dxi dx 2 = cp xi (t) aa (t) . 



— oo •/ — oo 



The characteristic function of a sum of random variables is the 
product of their characteristic functions. The result generalizes im- 
mediately to N random variables y r = x'j + • • • + x r N . The characteristic 
function of y r is 

N 



,(<) = n^w ( 22 ) 



and the probability density of y r is the Fourier back-transformation of this 
characteristic function 

g(y) = 7T / dte-^cPyit) . (23) 

The probability densitiy of the sample mean is obtained as follows: 
The arithmetic mean of y r is x r — y r /N. We denote the probability density 
of y r by gN(y) and the probability density of the arithmetic mean by cin(x). 
They are related by 

g N (x) = Ng N {Nx) . (24) 
This follows by substituting y = Nx into guiv) dy: 



1 = / g N (y)dy= / g N (Nx)2dx = / g N (x)dx. 

J —oo J —oo J — oo 

Fig. |3 illustrates equation (|24|l for the sums of two (|17|l and three (|18|) 
uniformly distributed random variables. This suggests that sampling leads 
to convergence of the mean by reducing its variance. We use the character- 
istic function 4> y {t) = [4>x(t)\ to understand the general behavior. The 
characteristic function for the corresponding arithmetic average is 

/+oo /- + oo / t \ 

dxe ltx g N (x) = J dy exp (i — yj g N (y) . 



Hence, 



<f>x{t) = 4> y 



N 



(25) 
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Fig. 3. Probability densities for the arithmetic means of two and three uniformly dis- 
tributed random variables, g~2(x) and 93(2:), respectively. 



To simplify the equations we restrict ourselves to x — 0. Let us consider a 
probability density f(x) and assume that its moment exists, implying that 
the characteristic function is a least two times differentiable, so that 

Mt) = 1 - y t 2 + °(t 3 ) ■ (26) 

The leading term reflects the the normalization of the probability density 
and the first moment is </>'(0) = x = 0. The characteristic function of the 
mean becomes 



1 



27V 2 



f + 



exp 



1 a. 



x ,2 



2 N 



fl 4 



This is the central limit theorem: The probability density of the arith- 
metic mean x r converges towards the Gaussian probability density with 
variance (compare Eq. J2U) 

a 2 {x r ) 



a\x r ) 



N 



(27) 
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Binning: The notion of binning introduced here should not be confused 
with histogramming. Binning means here that we group NDAT data into 
NBINS bins, where each binned data point is the arithmetic average of 

NBIN = [NDAT/NBINS] (Fortran integer division) 

data points in their original order. Preferably NDAT is a multiple of NBINS. 
The purpose of the binning procedure is twofold: 

(1) When the the central limit theorem applies, the binned data will be- 
come practically Gaussian, as soon as NBIN becomes large enough. This 
allows to apply Gaussian error analysis methods even when the original 
data are not Gaussian. 

(2) When data are generated by a Markov process subsequent events are 
correlated. For binned data these correlations are reduced and can in 
practical applications be neglected, once NBIN is sufficiently large com- 
pared to the autocorrelation time (see section 1171)1 . 

6. Gaussian Error Analysis for Large and Small Samples 

The central limit theorem underlines the importance of the normal distri- 
bution. Assuming we have a large enough sample, the arithmetic mean of a 
suitable expectation value becomes normally distributed and the calculation 
of the confidence intervals is reduced to studying the normal distribution. It 
has become the convention to use the standard deviation of the sample 
mean 

1 - 

a = a{x r ) with x r = —J2< (28) 

i=i 

to indicate its confidence intervals [x — n<x, x + no] (the dependence of a on 
N is suppressed). For a Gaussian distribution equation Eq. (JSJ yields the 
probability content p of the confidence intervals l(2"5|) to be 

1 f+ n _! 2 
p = p{n) = G{na) — G(—rw) = / dx e 2X — erf 

V27T J- n 

In practice the roles of x and x are interchanged: One would like to know the 
likelihood that the unknown exact expectation value x will be in a certain 
confidence interval around the measured sample mean. The relationship 

x G [x — ncr 1 x + na\ x G [x — na, x + na] (30) 

solves the problem. Conventionally, these estimates are quoted as 

x = x±Ax (31) 
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where the error bar Ax is often an estimator of the exact standard 
deviation. 

An obvious estimator for the variance crz, is 



1 N 



where the prime indicates that we shall not be happy with it, because we 
encounter a bias. An estimator is said to be biased when its expectation 
value does not agree with the exact result. In our case 

{(£?) + °l • (33) 

An estimator whose expectation value agrees with the true expectation 
value is called unbiased. The bias of the definition (|32[l comes from re- 
placing the exact mean x by its estimator x r . The latter is a random vari- 
able, whereas the former is just a number. Some algebra |7j shows that the 
desired unbiased estimator of the variance is given by 

AT 1 N 

«) 2 = -W^i^ = — tE«-^) 2 - ( 34 ) 

Correspondingly, the unbiased estimator of the variance of the sample mean 
is 

1 



(4) 2 = W hr ) t(-i--r. (35) 

Gaussian difference test: In practice one is often faced with the 
problem to compare two different empirical estimates of some mean. How 
large must D = x — y be in order to indicate a real difference? The quotient 



D' 



°d = x/4 + 4 (36) 



is normally distributed with expectation zero and variance one, so that 

P = P(\d r \<d) = G (d) - G (-d) = erf(-j|) . (37) 

The likelihood that the observed difference \x — y\ is due to chance 

is defined to be 

Q = l-P = 2G Q {-d) = 1-erfT-^J . (38) 
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If the assumption is correct, then Q is a uniformly distributed random 
variable in the range [0, 1). Examples are collected in tabled Often a 5% 
cut-off is used to indicate a real discrepancy. 



Table 1. Gaussian difference tests (compile and run the program provided in 
ForProc/Gau_dif , which results in an interactive dialogue). 



XI ± CTgj 


1.0 ±0.1 


1.0 ±0.1 


1.0 ±0.1 


1.0 ±0.05 


1.000 ±0.025 


X2 ± Ose 2 


1.2 ±0.2 


1.2 ± 0.1 


1.2 ± 0.0 


1.2 ±0.00 


1.200 ±0.025 


Q 


0.37 


0.16 


0.046 


0.000063 


0.15 x lO -7 



Gosset's Student Distribution: We ask the question: What happens 
with the Gaussian confidence limits when we replace the variance ctJ- by its 
estimator s§- in statements like 

Ifillfl < 1.96 with 95% probability. 

For sampling from a Gaussian distribution the answer was given by Gos- 
set, who published his article 1908 under the pseudonym Student in 
Biometrika [20! • He showed that the distribution of the random variable 



r 



(39) 



is given by the probability density 

1 



/(*) = 



1 



t z 



(40) 



(N- 1)5(1/2, (N- l)/2) V ' iV-1, 

Here B(x,y) is the beta function. The fall-off is a power law |t| for 
\t\ — > co, instead of the exponential fall-off of the normal distribution. 
Some confidence probabilities of the Student distribution are (assignment 
a0203_01): 



N \ S 


1.0000 


2.0000 


3.0000 


4.0000 


5.0000 


2 


.50000 


.70483 


.79517 


.84404 


.87433 


3 


.57735 


.81650 


.90453 


.94281 


.96225 


4 


.60900 


.86067 


.94233 


.97199 


.98461 


8 


.64938 


.91438 


.98006 


.99481 


.99843 


16 


.66683 


.93605 


.99103 


.99884 


.99984 


32 


.67495 


.94567 


.99471 


.99963 


.99998 


64 


.67886 


.95018 


.99614 


.99983 


1.0000 


INITY: 


.68269 


.95450 


.99730 


.99994 


1.0000 
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For N < 4 we find substantial deviations from the Gaussian confidence 
levels, whereas up to two standard deviations reasonable approximations 
of Gaussian confidence limits are obtained for N > 16 data. If desired, 
the Student distribution function can always be used to calculate the exact 
confidence limits. When the central limit theorem applies, we can bin a 
large set of non-Gaussian data into 16 almost Gaussian data to reduce the 
error analysis to Gaussian methods. 

Student difference test: This test is a generalization of the Gaussian 
difference test. It takes into account that only a finite number of events are 
sampled. As before it is assumed that the events are drawn from a normal 
distribution. Let the following data be given 

x calculated from M events, i.e., erf = ct^/M (41) 
y calculated from N events, i.e., o^=a y /N (42) 

and unbiased estimators of the variances are 

s l = s l/M= ^=)[ X ;-* )2 and 4 = 4/JV= ^T^f ~f 2 . (43) 
xl M(M-l) J v N(N-1) v ' 

Under the additional assumption = at the probability 

P(\x-y\>d) (44) 

is determined by the Student distribution function in the same way as the 
probability of the Gaussian difference test is determined by the normal 
distribution. 

Examples for the Student difference test for x\ = 1.00 ± 0.05 from M 
data and X2 = 1.20 ± 0.05 from N data are given in tabled The Gaussian 
difference test gives Q = 0.0047. For M = N = 512 the Student Q value is 
practically identical with the Gaussian result, for M = N = 16 it has almost 
doubled. Likelihoods above a 5% cut-off, are only obtained for M = N = 2 
(11%) and M = 16, JV = 4 (7%). The latter result looks a bit surprising, 
because its Q value is smaller than for M — N = 4. The explanation is that 
for M = 16, N — 4 data one would expect the N = 4 error bar to be two 
times larger than the M = 16 error bar, whereas the estimated error bars 
are identical. This leads to the problem: Assume data are sampled from 
the same normal distribution, when are two measured error bars consistent 
and when not? 
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Table 2. Student difference test for the data ~x\ = 1.00 ± 0.05 
and X2 = 1.20 ± 0.05 (compile and run the program provided in 
ForProc/StucLdif , which results in an interactive dialogue). 



M 


512 


32 


16 


16 


4 


3 


2 


N 


512 


32 


16 


4 


4 


3 


2 


Q 


0.0048 


0.0063 


0.0083 


0.072 


0.030 


0.047 


0.11 



6.1. x 2 Distribution, Error of the Error Bar, F-Test 

The distribution of the random variable 

(x r ) 2 = X>[) 2 , ( 45 ) 

i=l 

where each y[ is normally distributed, defines the x 2 distribution with TV 
degrees of freedom. The study of the variance {s r x ) 2 of a Gaussian sample 
can be reduced to the x 2 -distribution with / = N — 1 degrees of freedom 

W)2= (^i)M£ = f (46) 

The probability density of x 2 P er degree of freedom (pdf) is 

f N (x 2 ) = N f(N x 2 ) = } where a= | . (47) 

The Error of the Error Bar: For normally distributed data the num- 
ber of data alone determines the errors of error bars, because the x 2 distri- 
bution is exactly known. Confidence intervals for variance estimates s 2 x = 1 
from NDAT data (assignment a0204_01) are: 







q 


q 




q 




1-q 




1-q 


NDAT= 


2**K 


.025 


.150 




500 




850 




975 


2 


1 


.199 


.483 


2 


198 


27 


960 


1018 


255 


4 


2 


.321 


.564 


1 


268 


3 


760 


13 


902 


8 


3 


.437 


.651 


1 


103 


2 


084 


4 


142 


16 


4 


.546 


.728 


1 


046 


1 


579 


2 


395 


32 


5 


.643 


.792 


1 


022 


1 


349 


1 


768 


1024 


10 


.919 


.956 


1 


001 


1 


048 


1 


093 


16384 


14 


.979 


.989 


1 


000 


1 


012 


1 


022 
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The variance ratio test or F-test: We assume that two sets of normal 
data are given together with estimates of their variances: ,Nij and 
(s^ 2 , 7V2). We would like to test whether the ratio F — s^ 1 /s^ 2 differs from 
F = 1 in a statistically significant way. The probability (fi/f%)F < w, 
where fi = Ni — 1, i = 1, 2, is known to be 

HH = 1-^(^,1 / aj i/x) • (48) 

Examples are given in table|31 This allows us later to compare the efficiency 
of MC algorithms. 



Table 3. Examples for the F-test (use the program in ForProc/F_test 
or the one in ForProc/F_stud). 



Axi 


1.0 


1.0 


1.0 


1.0 


1.0 


1.0 


1.0 


1.0 




16 


16 


64 


1024 


2048 


32 


1024 


16 


Ax 2 


1.0 


1.0 


1.0 


1.05 


1.05 


2.0 


2.0 


2.0 


N 2 


16 


8 


16 


1024 


2048 


8 


256 


16 


Q 


1.0 


0.36 


0.005 


0.12 


0.027 


0.90 


0.98 


0.01 



6.2. The Jackknife Approach 

Jackknife estimators allow to correct for the bias and the error of the bias. 
The method was introduced in the 1950s (for a review see [?])■ It is rec- 
ommended as the standard for error bar calculations. In unbiased situ- 
ations the jackknife and the usual error bars agree. Otherwise the jackknife 
estimates are improvements. 

The unbiased estimator of the expectation value x is 

1 N 

T — — \ ^ T 

i=l 

Bias problems may occur when one estimates a non-linear function of x: 

1= m ■ (49) 

Typically, the bias is of order 1/N: 

bias (/) = /-</) = | + |L + 0(_L) (50 ) 

where a\ and are constants. But for the biase estimator we lost the ability 
to estimate the variance cr 2 (/) = a 2 (f)/N via the standard equation 

_1 1 N 

S 2 (f) = T7* 2 (/) = WTVr ^ ECA-/) 2 . (51) 



TV w; N(N-l)j-^ 
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because fa — f(xi) is not a valid estimator of /. Further, it is in non- 
trivial applications almost always a bad idea to use linear error propagation 
formulas. Jackknife methods are not only easier to implement, but also more 
precise and far more robust. 

The error bar problem for the estimator / is conveniently overcome by 
using jackknife estimators / , //, defined by 

, N , 

7 J = N^2fi with fi = and x i = ^—\^ Xk ■ (52) 

i—1 k^i 

The estimator for the variance <r 2 (/ ) is 

AT - 1 N 

4(7 J ) = ^VE(//"7 j ) 2 - (53) 

i=l 

Straightforward algebra shows that in the unbiased case the estimator of 
the jackknife variance (|53|l reduces to the normal variance (|51ll . Notably 
only of order N (not N 2 ) operations are needed to construct the jackknife 
averages xf , i = 1, . . . , N from the orginal data. 

7. Statistial Physics and Potts Models 

MC simulations of systems described by the Gibbs canonical ensemble aim 
at calculating estimators of physical observables at a temperature T. In 
the following we choose units so that the Boltzmann constant becomes one, 
i.e. (3 = 1/T. Let us consider the calculation of the expectation value of 
an observable O. Mathematically all systems on a computer are discrete, 
because a finite word length has to be used. Hence, the expectation value 
is given by the sum 

K 

d = d((3) = (0)=Z- 1 Y / (k) e- l3E(k> (54) 

k=l 

K 

where Z = Z{fi) = ^ e^^' (55) 

k=l 

is the partition function. The index k = 1, . . . , K labels the configura- 
tions of the system, and E^> is the (internal) energy of configuration k. 
The configurations are also called microstates. To distinguish the config- 
uration index from other indices, it is put in parenthesis. 

We introduce generalized Potts models in an external magnetic field on 
d-dimensional hypercubic lattices with periodic boundary conditions (i.e., 
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the models are defined on a torus in d dimensions). Without being overly 
complicated, these models are general enough to illustrate the essential 
features we are interested in. In addition, various subcases of these models 
are by themselves of physical interest. 

We define the energy function of the system by 

-P £ (fc) = -p E ( k} + H M< fc > (56) 

where 

^* ) = -2E% w .«! fc) ) + — ( 57 ) 

i. — < j n 

m y 

wit- - M«=2g i( i, 9 f)). 

The sum (ij) is over the nearest neighbor lattice sites and qi k ' is called 
the Potts spin or Potts state of configuration k at site i. For the g-state 

(k) 

Potts model q\ takes on the values 1, . . . , q. The external magnetic field is 
chosen to interact with the state qi — 1 at each site i, but not with the other 
states qi ^ 1. The case q = 2 becomes equivalent to the Ising ferromagnet. 
See F.Y. Wu |2U for a detailed review of Potts models. 
For the energy per spin our notation is 

e s = E/N . (58) 

A factor of two and an additive constant are introduced in Eq. I|57|l , so that 
e s agrees for q — 2 with the conventional Ising model definition, and 

[3 = /3 Isin s = i/3 Potts . (59) 

For the 2d Potts models a number of exact results are known in the infi- 
nite volume limit, mainly due to work by Baxter p^. The phase transions 
temperatures are 

I/3 c P°tt s = p c = i. = iln(l + V?), q = 2,3,... . (60) 

At p c the average energy per state is 

el = E C JN = 1 - 2 - ■ (61) 

The phase transition is second order for q < 4 and first order for q > 5. 
The exact infinite volume latent heats Ae s and entropy jumps As were 
also found by Baxter pQ, while the interface tensions f s were derived later 
(see jH] and references therein). 
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Random Sampling 
Re-weighted to |3=0.2 
MC at (3=0-2 
MC at (3=0.4 




Fig. 4. Energy histograms of 100 000 entries each for the Ising model on a 20 X 20 lattice: 
Random Sampling gives statistically independent configurations at (i = 0. Histograms 
at /3 = 0.2 and f3 = 0.4 are generated with Markov chain MC. Re-weighting of the /3 = 
random configurations to /3 = 0.2 is shown to fail (assignments a0301_02 and a0303_02). 



8. Sampling and Re-weighting 

For the Ising model it is straightforward to sample statistically indepen- 
dent configurations. We simply have to generate N spins, each either up 
or down with 50% likelihood. This is called random sampling. In Fig. 01 
a thus obtained histogram for the 2d Ising model energy per spin is 
depicted. 

Note that is is very important to distinguish the energy measurements 
on single configurations from the expectation value. The expectation value 
e s is a single number, while e s fluctuates. From the measurement of many 
e s values one finds an estimator of the mean, e s , which fluctuates too. 

The histogram entries at f3 = can be re-weighted so that they corre- 
spond to other (3 values. We simply have to multiply the entry corresponding 
to energy E by exp(— j3E). Similarly histograms corresponding to the Gibbs 
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ensemble at some value 0o can be re- weighted to other (3 values. Care has to 
be taken to ensure that the involved arguments of the exponential function 
do not become too large. This can be done by first calculating the mean 
energy and then implementing re-weighting with respect to the difference 
from the mean. 

Re-weighting has a long history. For finite size scaling (FSS) investi- 
gations of second order phase transitions its usefulness has been stressed 
by Ferrenberg and Swendsen |12| (accurate determinations of peaks of the 
specific heat or of susceptibilities). 

In Fig. 0| re-weighting is done from (3q — to (3 = 0.2. But, by com- 
parison to the histogram from a Metropolis MC calculation at (3 — 0.2, the 
result is seen to be disastrous. The reason is easily identified: In the range 
where the (3 — 0.2 histogram takes on its maximum, the (3 = histogram 
has not a single entry. Our random sampling procedure misses the impor- 
tant configurations at [3 = 0.2. Re- weighting to new (3 values works only in 
a range (3q ± A/3, where A/3 — > in the infinite volume limit. 

Important Configurations: Let us determine the important contri- 
butions to the partition function. The partition function can be re-written 
as a sum over energies 

Z = Z(P) = Y / n(E)e~' 3E (62) 

E 

where the unnormalized spectal density n(E) is defined as the number of 
microstates k with energy E. For a fixed value of (3 the energy probability 
density 

Pp{E) = c p n{E)e-P E (63) 

is peaked around the average value E((3), where cp is a normalization con- 
stant determined by J2e Ppi-^) = 1- 

Away from first and second order phase transitions, the width of the 
energy distribution is AE ~ y/V. This follows from the fact that the fluc- 
tuations of the N ~ V lattice spins are essentially uncorrelated, so that the 
magnitude of a typical fluctuations is ~ V/V. As the energy is an extensive 
quantity ~ V, we find that the re-weighting range is A/3 ~ so that 

A/3E ~ vk stays within the fluctuation of the system. 

Interestingly, the re-weighting range increases at a second order phase 
transition point, because critical fluctuations are larger than non-critical 
fluctuations. Namely, one has AE ~ V x with 1/2 < x < 1 and the require- 
ment A(3E - V x yields A/3 - P" 1 . 
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For first order phase transitions one has a latent heat AV^ ~ V, but 
this does not mean that the re-weighting range becomes of order one. In 
essence, the fluctuations collapse, because the two phases become separated 
by an interface. One is back to fluctuations within either of the two phases, 
i.e. A/3 - l/VV. 

The important configurations at temperature T — 1/(3 are at the energy 
values for which the probability density Pp(E) is large. To sample them 
efficiently, one needs a procedure which generates the configurations with 
their Boltzmann weights 

Jk) = eHJB (« _ (64) 

The number of configurations n{E) and the weights combine then so that 
the probability to generate a configuration at energy E becomes precisely 
Pp(E) as given by equation 

9. Importance Sampling and Markov Chain Monte Carlo 

For the canonical ensemble importance sampling generates configura- 
tions k with probability 

P B = c B w y B > = c B e pa (65) 

where the constant c B is determined by the normalization condition 

Sfe -^b = 1- The vector (P B ) is called Boltzmann state. When COnfigU- 
ffc) 

rations are stochastically generated with probability P B , the expectation 
value becomes the arithmetic average: 

, N K 

6 = d(J3) = (O) = km — J2 ® (kn) ■ (66) 

n— 1 

Truncating the sum at some finite value of Nk, we obtain an estimator 
of the expectation value 

_ , N K 

O = Y . (67) 

N K ^ 

Normally, we cannot generate configurations k directly with the probabil- 
ity (|65|) . but they may be found as members of the equilibrium distribu- 
tion of a dynamic process. A Markov process is a particularly simple 
dynamic process, which generates configuration k n+ i stochastically from 
configuration k n , so that no information about previous configurations 
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fc n _i,fe n _2, ■ ■ ■ is needed. The elements of the Markov process time se- 
ries are the configurations. Assume that the configuration k is given. Let 
the transition probability to create the configuration I in one step from k 
be given by W (l){k) = W[k->l}. The transition matrix 

W = (V (()(fc) ) (68) 

defines the Markov process. Note, that this matrix is a very big (never stored 
in the computer), because its labels are the configurations. To generate 
configurations with the desired probabilities, the matrix W needs to satisfy 
the following properties: 

(i) Ergodicity: 

e-? EW > and e~ pEW > imply : (69) 

an integer number n > exists so that (V^™)(')( fe ) > o holds. 

(ii) Normalization: 

J2w (l){k) = 1 . (70) 
i 

(iii) Balance: 

^2 w (DW e -PEW = e -f, E m (71) 

k 

Balance means: The Boltzmann state (|65t is an eigenvector with eigen- 
value 1 of the matrix W = (W^W). 

An ensemble is a collection of configurations for which to each con- 
figuration k a probability is assigned, ^^P^ — 1. The Gibbs or 
Boltzmann ensemble Eb is defined to be the ensemble with the proba- 
bility distribution H65J1 . 

An equilibrium ensemble E eq of the Markov process is defined by its 
probability distribution P eq satisfying 

WP eq = P eq , in components = ^ W {l)[k) P [ e k) . (72) 

k 

Statement: Under the conditions (i), (ii) and (iii) the Boltzmann en- 
semble is the only equilibrium ensemble of the Markov process. 

For a proof the readers is referred to [Jj. There are many ways to con- 
struct a Markov process satisfying (i), (ii) and (iii). A stronger condition 
than balance (|71JI is 
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(iii') Detailed balance: 

W {l){k) e -0E(V = W (k)(l) e -PE«\ (73) 

Using the normalization ^2 k W^^ = 1 detailed balance implies bal- 
ance (iii). 

At this point we have succeeded to replace the canonical ensemble aver- 
age by a time average over an artificial dynamics. Calculating averages over 
large times, like one does in real experiments, is equivalent to calculating 
averages of the ensemble. One distinguishes dynamical universality classes. 
The Metropolis and heat bath algorithms discussed in the following fall 
into the class of so called Glauber dynamics, model A in a frequently used 
classification JOj- Cluster algorithms |5J constitute another universality 
class. 



9.1. Metropolis and Heat Bath Algorithm for Potts Models 

The Metropolis algorithm can be used whenever one knows how to cal- 
culate the energy of a configuration. Given a configuration k, the Metropolis 
algorithm proposes a configuration / with probability 

f(l, k) normalized to ^ k ) = 1 • ( 74 ) 

I 

The new configuration I is accepted with probability 



{I 

1, 



p y 

B 

p(fe) 



1 for < £ (fe) 



(75) 



If the new configuration is rejected, the old configuration has to be counted 
again. The acceptance rate is defined as the ratio of accepted changes 
over proposed moves. With this convention we do not count a move as 
accepted when it proposes the at hand configuration. 

The Metropolis procedure gives rise to the transition probabilities 

W mk) - f{l,k)w (l){k) for l^k (76) 
and W {k){k) = f(k, k) + J2 f(h k ) (1 - w (i)(fe) ) ■ (77) 

Therefore, the ratio (w(O(fc)/w(*0(O) satisfies detailed balance (JZ2J if 

f(l,k) = f(k,l) holds. (78) 
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Otherwise the probability density f(l, k) is unconstrained. So there is an 
amazing flexibility in the choice of the transition probabilities WWW . Also, 
the algorithm generalizes immediately to arbitrary weights. 

The heat bath algorithm chooses a state qt directly with the local 
Boltzmann distribution defined by its nearest neighbors. The state qi can 
take on one of the values 1, . . . , q and, with all other states set, determines 
a value of the energy function l|56|) . We denote this energy by E(qi) and 
the Boltzmann probabilities are 

P B {qi) = const e-P^ (79) 

where the constant is determined by the normalization condition 

E Pb ^ = 1 ■ ( 8 °) 

In equation (|79"|) we can define E(qi) to be just the contribution of the 
interaction of qi with its nearest neighbors to the total energy and absorb 
the other contributions into the overall constant. Here we give a generic 
code which works for arbitrary values of q and d (other implementations 
may be more efficient). 

We calculate the cumulative distribution function of the heat bath prob- 
abilities 

The normalization condition l|80[) implies Phb{<i) = 1- Comparison of these 
cumulative probabilities with a uniform random number x r yields the heat 
bath update qi — ► q[. Note that in the heat bath procedure the original 
value ql n does not influence the selection of gf ew . 

9.2. The 0(3) er Model and the Heat Bath Algorithm 

We give an example of a model with a continuous energy function. Expec- 
tation values are calculated with respect to the partition function 

Z = [Hdsi e-P E « s *» . (82) 




The spins Sj = s< 2 are normalized to (s*j) = 1 (83) 
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1 i-It: 



and the measure dsi is defined by J dsi — — J dcos(8i) 

(84) 

where the polar (9i) and azimuth (jpi) angles define the spin Si on the unit 
sphere. The energy is 

E = -Y, g i g j > ( 85 ) 
(ij) 

where the sum goes over the nearest neighbor sites of the lattice and vecSiSj 
is the dot product of the vectors. The 2d version of the model is of interest 
to field theorists because of its analogies with the four-dimensional Yang- 
Mills theory. In statistical physics the <i-dimensional model is known as the 
Heisenberg ferro magnet (references can be found in [Tj). 

We would like to update a single spin s. The sum of its 2d neighbors is 

S = s\ + s 2 H h s 2 d-\ + s 2 d ■ 

Hence, the contribution of spin s to the energy is 2d — sS. We propose a 
new spin s with the measure (18411 by drawing two uniformly distributed 
random numbers 

<fr r E [0, 2ir) for the azimuth angle and 
cos(# r ) — x r E [—1, +1) for the cosine of the polar angle . 

This defines the probability function f(s , s) of the Metropolis process, 
which accepts the proposed spin s with probability 

J 1 for Ss' > Ss, 
1 e -P(.&-3n') f or Ss < Ss. 



wis 



If sites are chosen with the uniform probability distribution 1/N per 
site, where TV is the total number of spins, it is obvious that the algorithm 
fulfills detailed balance. It is noteworthy that the procedure remains valid 
when the spins are chosen in the systematic order 1, . . . , N. Balance (I71f) 
still holds, whereas detailed balance (|73|) is violated (an exercise of Ref. [7])- 

One would prefer to choose s directly with the probability 

W(s^s') = P(s';S) = const e p s "' 3 . 

The heat bath algorithm creates this distribution. Implementation of it 
becomes feasible when the energy function allows for an explicit calculation 
of the probability P(s ; S). This is an easy task for the 0(3) er-model. Let 

a = angle(s ,S), x — cos(a) and S — (3\S\ . 
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For S = a new spin s is simply obtained by random sampling. We assume 
in the following 5 > 0. The Boltzmann weight becomes exp(xiS) and the 
normalization constant follows from 

f+i 



f dx e xS = ~ sinh(S) 



Therefore, the desired probability is 

P &'>^ = 9 . S uq ^ xS =--f( x ) 
2 smh(b) 

and the method of Eq. (JSJ) can be used to generate events with the prob- 
ability density f{x). A uniformly distributed random number y r £ [0, 1) 
translates into 

x r = cos a r = 1 In [ exp(+S') - y r exp(+S') + t/ r exp(-S')] . (86) 

Finally, one has to give s a direction in the plane orthogonal to S. This 
is done by choosing a random angle (3 r uniformly distributed in the range 
< (3 r < 27r. Then, x r = cosa r and f3 r completely determine s with 
respect to S. Before storing s in the computer memory, we have to cal- 
culate coordinates of s with respect to a Cartesian coordinate system, 
which is globally used for all spins of the lattice. This amounts to a linear 
transformation. 

9.3. Example Runs 

Start and equilibration: Under repeated application of one of our up- 
dating procedures the probability of states will approach the Boltzmann 
distribution. However, initially we have to start with a microstate which 
may be far off the Boltzmann distribution. Suppression factors like 10~ 10000 
are well possible. Although the weight of states decreases with 1 jn where 
n is the number of steps of the Markov process, one should exclude the ini- 
tial states from the equilibrium statistics. In practice this means we should 
allow for a certain number of sweeps nequi to equilibrate the system. One 
sweep updates each spin once or once in the average. 

Many ways to generate start configurations exist. Two natural and easy 
to implement choices are: 

(1) Generate a random configuration corresponding to f3 — 0. This defines 
a random or disordered start of a MC simulation. 

(2) Generate a configuration for which all Potts spins take on the same 
q- value. This is called an ordered start of a MC simulation. 
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-0.5 



-1.5 
-2 

50 100 150 200 

Sweeps 

Fig. 5. Two Metropolis time series of 200 sweeps each for a 2d Ising model on a 80 X 80 
lattice at /3 = 0.4 are shown. Random updating for which the positions of the spins are 
chose with the uniform probability distribution was used. Measurements of the energy 
per spin after every sweep are plotted for ordered and disordered starts. The exact mean 
value e s = —1.10608 is also indicated (assignment a0303_01). 

Examples of initial time series are given in Fig. and El Unless explic- 
itly stated otherwise, we use here and in the following always sequential 
updating, for which the spins are touched in a systematic order. 

Consistency Checks: For the 2c? Ising model we can test against the 
exact finite lattice results of Ferdinand and Fisher JI] . We simulate a 20 2 
lattice at /3 = 0.4, using a statistics of 10 000 sweeps for reaching equilib- 
rium. The statistics for measurement is chosen to be 64 bins of 5 000 sweeps 
each. The number 64 is taken, because according to the student distribution 
the approximation to the Gaussian distribution is then excellent, while the 
binsize of 5 000 200) is argued to be large enough to neglect correlations 
between the bins. A more careful analysis is the subject of our next section. 



Random Start 
Ordered Start 
Exact 
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50 100 150 200 



Sweeps 

Fig. 6. q = 10 Potts model time series of 200 sweeps on a 80 X 80 lattice at fi = 0.62. 
Measurements of the energy per spin after every sweep are plotted for ordered and 
disordered starts (assignment a0303_05). 

With our statistics we find (assignment a0303_06) 

e s = -1.1172 (14) (Metropolis) versus e s = -1.117834 (exact) . (87) 

The Gaussian difference test gives a perfectly admissible value, Q = 0.66. 

For the 2d 10-state Potts model at f3 = 0.62 we test our Metropolis 
versus our heat bath code on a 20 x 20 lattice. For the heat bath updating 
we use the same statistics as for the 2d Ising model. For the Metropolis 
updating we increase these numbers by a factor of four. This increase is 
done, because we expect the performance of Metropolis updating for the 
10-state model to be worse than for the 2-state model: At low temperature 
the likelihood to propose the most probable (aligned) Potts spin is 1/2 for 
the 2-state model, but only 1/10 for the 10-state model, and (i — 0.62 is 
sufficiently close to the ordered phase, so that this effect is expected to 
be of relevance. The results of our simulations are (assignment a0303_08) 
e s = -0.88709 (30) (Metropolis) versus e s = -0.88664 (28) (heat bath) 
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Fig. 7. Histogram of the energy per spin for the 3d 3-state Potts model on a 24 3 latticed 
at P = 0.275229525 (assignment a0303_10). 

and Q = 0.27 for the Gaussian difference test. Another perfectly admissable 
value. 

To illustrate features of a first order phase transition for the 3d 3-state 
Potts model, we use the 1-hit Metropolis algorithm on a 24 3 lattice and sim- 
ulate at f3 = 0.275229525. We perform 20 000 sweeps for reaching equilib- 
rium, then 64 x 10 000 sweeps with measurements. From the latter statistics 
we show in Fig.0the energy histogram and its error bars. The histogram ex- 
hibits a double peak structure, which is typically obtained when systems 
with first order transitions are simulated on finite lattices in the neigh- 
borhood of so called pseudo-transition temperatures. These are finite 
lattice temperature definitions, which converge with increasing system size 
towards the infinite volume transition temperature. Equal heights of the 
maxima of the two peaks is one of the popular definition of a pseudo- 
transition temperature for first order phase transitions. Equal weights (ar- 
eas under the curves) is another, used in the lecture by Prof. Landau. Our 
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Fig. 8. Peaked distribution functions for the 0(3) tr-model mean energy per link on 
various lattices at j3 = 1.1 (assignment a0304_08). 



value needs to be re-weighted to a slightly higher value to arrange for equal 
heights (assignment a0303_10). Our mean energy per spin, corresponding 
to the histogram of the figure is e s = —1.397 (13). Due to the double peak 
structure of the histogram the error bar is relatively large. Still, the cen- 
tral limit theorem works and a Kolmogorov test shows that our statistics 
is large enough to create an approximately Gaussian distribution for the 
binned data (assignment a0303_ll). 

Self- Averaging Illustration for the 0(3) model: We compare in 
Fig. IHlthe peaked distribution function of the mean energy per link e/ for 
different lattice sizes. The property of self- averaging is observed: The 
larger the lattice, the smaller the confidence range. The other way round, 
the peaked distribution function is very well suited to exhibit observables 
for which self-averaging does not work, as for instance encountered in spin 
glass simulations 0. 
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10. Statistical Errors of Markov Chain Monte Carlo Data 

In large scale MC simulation it may take months, possibly years, to collect 
the necessary statistics. For such data a thorough error analysis is a must. 
A typical MC simulation falls into two parts: 

(1) Equilibration: Initial sweeps are performed to reach the equilibrium 
distribution. During these sweeps measurements are either not taken 
at all or they have to be discarded when calculating equilibrium expec- 
tation values. 

(2) Data Production: Sweeps with measurements are performed. Equi- 
librium expectation values are calculated from this statistics. 

A rule of thumb is: Do not spend more than 50% of your CPU 
time on measurements! The reason for this rule is that one cannot be 
off by a factor worse than two (y/2 in the statistical error). 

How many sweeps should be discarded for reaching equilibrium? In a 
few situations this question can be rigorously answered with the Coupling 
from the Past method (see the article by W. Kendall in this volume). The 
next best thing to do is to measure the integrated autocorrelation time 
and to discard, after reaching a visually satisfactory situation, a number of 
sweeps which is larger than the integrated autocorrelation time. In practice 
even this can often not be achieved. 

Therefore, it is re-assuring that it is sufficient to pick the number of 
discarded sweeps approximately right. With increasing statistics the con- 
tribution of the non-equilibrium data dies out like 1/N, where N is the 
number of measurements. This is eventually swallowed by the statistical 
error, which declines only like l/VN. The point of discarding the equilib- 
rium configurations is that the factor in front of 1/N can be large. 

There can be far more involved situations, like that the Markov chain 
ends up in a metastable configuration, which may even stay unnoticed (this 
tends to happen in complex systems like spin glasses or proteins). 

10.1. Autocorrelations 

We like to estimate the expectation value / of some physical observable. We 
assume that the system has reached equilibrium. How many MC sweeps are 
needed to estimate / with some desired accuracy? To answer this question, 
one has to understand the autocorrelations within the Markov chain. 
Given is a time series of N measurements from a Markov process 

fi = f( Xi ), i = l,...,N, (88) 
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where Xi are the configurations generated. The label i — 1, . . . , N runs in 
the temporal order of the Markov chain and the elapsed time (measured 
in updates or sweeps) between subsequent measurements fi, fi + \ is always 
the same. The estimator of the expectation value / is 



(89) 



With the notation 



t=\i-j\ 

the definition of the autocorrelation function of the observable / is 
C(t) = = < (fi-(fi)) (£-</;» ) = (m-ifi) (fi) = (fo.ft)-f 2 (90) 

where we used that translation invariance in time holds for the equilibrium 
ensemble. The asymptotic behavior for large t is 



C{t) ~ exp for t 



i cxp 



(91) 



where r oxp is called (exponential) autocorrelation time and is related 
to the second largest eigenvalue Ai of the transition matrix by r exp = — In Ai 
under the assumption that / has a non-zero projection on the corresponding 
eigenstate. Superselection rules are possible so that different autocorrelation 
times reign for different operators. 

The variance of / is a special case of the autocorrelations <|9(J[) 



C(0) = a 2 (/). 



(92) 



Some algebra |7j shows that the variance of the estimator / 189fl for the 
mean and the autocorrelation functions l|90() are related by 



- 2 (7) 



N 



N-l 



with c{t) 



C(t) 

6(0) 



(93) 



This equation ought to be compared with the corresponding equation for 
uncorrelated random variables <J 2 (f) = <r 2 (f)/N . The difference is the fac- 
tor in the bracket of (|93|l . which defines the integrated autocorrelation 
time 



N-l 

i=i 



i-ffi*) 



(94) 



February 2, 2008 Master Review Vol. 9in x 6in - (for Lecture Note Series, IMS, NUS) 



article 



Markov Chain Monte Carlo Simulations and their Statistical Analysis 33 

For correlated data the variance of the mean is by the factor r; nt larger 
than the corresponding naive variance for uncorrelated data: 

" - with^ aive = ^. (95) 



a 2 ff\ "naive jy 

naive W / 

In most simulations one is interested in the limit N — » oo and equation (|94|) 
becomes 

oo 

Tint = 1+2 £ c(t) . (96) 
t=i 

The numerical estimation of the integrated autocorrelation time faces dif- 
ficulties. Namely, the variance of the N — > oo estimator of Tj n t diverges: 



r int = l + 2j2~c(t) and cr 2 (r int ) -» oo , (97) 
t=i 

because for large t each c(i) adds a constant amount of noise, whereas the 
signal dies out like exp(— t/r cxp ). To obtain an estimate one considers the 
i-dependent estimator 

t 

r int (t) = 1 + 2 5] c(t') (98) 
t'=i 

and looks out for a window in t for which T- m t(i) is flat. 

To give a simple example, let us assume that the autocorrelation func- 
tion is governed by a single exponential autocorrelation time 

C{t) = const exp ( — ] . (99) 



In this case we can carry out the sum (|96|l for the integrated autocorrelation 
function and find 

oo — 1/t 

r lnt = l + 25>-*/^ = l+ 6 (100) 

For a large exponential autocorrelation time r cxp S> 1 the approximation 

2 e~ 1 / Tcxp 2 — 2/t 

7mt = l + - 77— = 1+ , / CXP =2t cxp -1 S 2t oxp (101) 

1 - e i / T «p l/T 0X p 

holds. 
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10.2. Integrated Autocorrelation Time and Binning 

Using binning the integrated autocorrelation time can also be estimated via 
the variance ratio. We bin the time series l|88|) into N bs < N bins of 



N h = NBIN = 



N 



NDAT 



NBINS 



(102) 



data each. Here [.] stands for Fortran integer division, i.e., Nb = NBIN is 
the largest integer < N/Nb s , implying N ba ■ Nb < N. It is convenient to 
choose the values of N and Nb s so that N is a multiple of Nb s . The binned 
data are the averages 

if* = TT Yl fi tor j = l,...,N bs . (103) 



pN b _ 

' j N b ^ 

i=l + (j-l)N b 



For Nb > r exp the autocorrelations are essentially reduced to those between 
nearest neighbor bins and even these approach zero under further increase 
of the binsize. 

For a set of Nb s binned data /• *, (J = 1, . . . , Nb s ) we may calculate the 
mean with its naive error bar. Assuming for the moment an infinite time 
series, we find the integrated autocorrelation time (|95|l from the following 
ratio of sample variances 

^ = <t with r - = (^r) ■ (104) 

In practice the Nb — > oo limit will be reached for a sufficiently large, finite 
value of Ni,. The statistical error of the T m t estimate (|104f) is, in the first 
approximation, determined by the errors of sl Nb . The typical situation is 
then that, due to the central limit theorem, the binned data are approxi- 
mately Gaussian, so that the error of Sj Nb is analytically known from 

the x 2 distribution. Finally, the fluctuations of Sj of the denominator give 
rise to a small correction which can be worked out \7\ . 

Numerically most accurate estimates of Tint are obtained for the finite 
binsize N b which is just large enough that the binned data (|103|l are prac- 
tically uncorrelated. While the Student distribution shows that the con- 
fidence intervals of the error bars from 16 uncorrelated normal data are 
reasonable approximations to those of the Gaussian standard deviation, 
about 1000 independent data are needed to provide a decent estimate of 
the corresponding variance (at the 95% confidence level with an accuracy 
of slightly better than 10%). It makes sense to work with error bars from 
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16 binned data, but the error of the error bar, and hence a reliable estimate 
of ri n t, requires far more data. 

10.3. Illustration: Metropolis generation of normally 
distributed data 

We generate normally distributed data according to the Markov process 

x' = x + 2ax r ~a (105) 

where x is the event at hand, x r a uniformly distributed random number 
in the range [0, 1), and the real number a > is a parameter which relates 
to the efficiency of the algorithm. The new event x' is accepted with the 
Metropolis probability 

, fl for x 12 <x 2 ; * . 

[exp[-(x' 2 - ar)/2] for x' 2 > x 2 . 

If x' is rejected, the event x is counted again. The Metropolis process in- 
troduces an autocorrelation time in the generation of normally distributed 
random data. 

We work with N = 2 17 = 131072 data and take a = 3 for the Markov 
process (|1U5|1 . what gives an acceptance rate of approximately 50%. The 
autocorrelation function of this process is depicted in Fig. [5] (assignment 
a0401_01). The integrated autocorrelation time (assignment a0401_02) is 
shown in Fig. EH We compare the estimators with the direct estimators 
T int (t) at 

t = N b - 1 . (107) 

With this relation the estimators agree for binsize Nb = 1 and for larger 
Nb the relation gives the range over which we combine data into either 
one of the estimators. The approach of the binning procedure towards the 
asymptotic T ln t value is slower than that of the direct estimate of Tint- 

For our large NDAT = 2 21 data set Ti nt (t) reaches its plateau before t = 
20. All the error bars within the plateau are strongly correlated. Therefore, 
it is not recommendable to make an attempt to combine them. Instead, 
it is save to pick an appropriate single value and its error bar as the final 
estimate: 

Tint = T int (20) = 3.962 ± 0.024 from 2 21 = 2, 097, 152 data. (108) 

The binning procedure, on the other hand, shows an increase ofr^ all the 
way to Nb = 2 7 = 128, where the estimate with the one confidence level 
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Fig. 9. The autocorrelation function 1901 of a Metropolis time series for the normal 
distribution (upper data) in comparison with those of our Gaussian random number 
generator (lower data). For t > 11 the inlay shows the autocorrelations on an enlarged 
ordinate. The straight lines between the data points are just to guide the eyes. The 
curves start with C(0) ~ 1 because the variance of the normal distribution is one. 

error bounds is 

3.85 < r^f < 3.94 from 2 14 = 16, 384 bins from 2 21 data. (109) 

How many data are needed to allow for a meaningful estimate of the 
integrated autocorrelation time? 

For a statistics of NDAT = 2 17 the autocorrelation signal disappears for 
t > 11 into the statistical noise. Still, there is clear evidence of the hoped 
for window of almost constant estimates. A conservative choice is to take 
t = 20 again, which now gives 

Tint = r int (20) = 3.86 ± 0.11 from 2 17 data. (110) 

Worse is the binning estimate, which for the 2 17 data is 

3.55 < < 3.71 from 2 12 = 4, 096 bins from 2 17 = 131, 072 data. (Ill) 
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Fig. 10. The upper curves in the figure and its inlays display the estimators obtained 
by direct calculation. The lowest curve is for the Gaussian random number generator. 
The remaining curves are binning procedure estimators of the integrated autocorrelation 
time with one standard deviation bounds. The main figure relies on 2 21 data and depicts 
estimators up to t = 127. The first inlay relies on 2 17 data and depicts estimators up to 
t = 31. The second inlay relies on 2 14 data and depicts estimators up to t = 15. 

Our best value 1(108|) is no longer covered by the two standard deviation 
zone. 

For the second inlay the statistics is reduced to NDAT = 2 14 . With the 
integrated autocorrelation time rounded to 4, this is 4096 times Tint- For 
binsize N b = 2 4 = 16 we are then down to Nbs = 1024 bins, which are 
needed for accurate error bars of the variance. To work with this number 
we limit, in accordance with equation 1)107(1 . our T ln t(t) plot to the range 
t < 15. Still, we find a quite nice window of nearly constant T- lnt (t), namely 
all the way from t = 4 to t = 15. By a statistical fluctuation (assignment 
a0401_03) Ti n t(t) takes its maximum value at t = 7 and this makes Ti n t(7) = 
3.54±0.13 a natural candidate. However, this value is inconsistent with our 
best estimate 1(108(1 . The true Unt{t) increases monotonically as function of 
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t, so we know that the estimators have become bad for t > 7. The error bar 
at t = 7 is too small to take care of our difficulties. One may combine the 
t = 15 error bar with the t = 7 estimate. In this way the result is 

r int = 3.54 ± 0.21 for 2 14 = 16, 384 data, (112) 

which achieves consistency with ()108(l in the two error bar range. For binsize 
Nb = 16 the binning estimate is 

2.93 < T; 1 ^ < 3.20 from 2 10 = 1, 024 bins from 2 14 data. (113) 

Clearly, the binsize Nb — 16 is too small for an estimate of the integrated 
autocorrelation time. We learn that one needs a binsize of at least ten times 
the integrated autocorrelation time T- lnt , whereas for its direct estimate it 
is sufficient to have t about four times larger than Tint. 

11. Self-consistent versus reasonable error analysis 

By visual inspection of the time series, one may get an impression about 
the length of the out-of-equilibrium part of the simulation. On top of this 
one should still choose 

nequi > T int , (114) 

to allow the system to settle. That is a first reason, why it appears neces- 
sary to control the integrated autocorrelation time of a MC simulations. A 
second reason is that we have to control the error bars of the equilibrium 
part of our simulation. Ideally the error bars are calculated as 

A/ = V^T) with a 2 [f) = r int . (115) 

This constitutes a self-consistent error analysis of a MC simulation. 

However, the calculation of the integrated autocorrelation time may 
be out of reach. Many more than the about twenty independent data are 
needed, which according to the Student distribution are sufficient to esti- 
mate mean values with reasonably reliable error bars. 

In practice, one has to be content with what can be done. Often this 
means to rely on the binning method. We simply calculate error bars 
of our ever increasing statistics with respect to a fixed number of 

NBINS > 16 . (116) 

In addition, we may put 10% of the initially planned simulation time away 
for reaching equilibrium. A-posteriori, this can always be increased. Once 
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Fig. 11. Comparison of the integrated autocorrelation time of the Metropolis process 
with random updating versus sequential updating for the d = 2 Ising model at (3 = 0.4 
(assignment a0402_01 B). The ordering of the curves is identical with the ordering of the 
labels in the figure. 

the statistics is large enough, our small number of binned data become 
effectively independent and our error analysis is justified. 

How do we know that the statistics has become large enough? In practi- 
cal applications there can be indirect arguments, like FSS estimates, which 
tell us that the integrated autocorrelation time is in fact (much) smaller 
than the achieved bin length. This is no longer self-consistent, as we perform 
no explicit measurement of T m t, but it is a reasonable error analysis. 

12. Comparison of Markov chain MC algorithms 

Is the 1-hit Metropolis algorithm more efficient with sequential updating 
or with random updating? For 2d Ising lattices at /3 = 0.4 Fig. illus- 
trates that sequential updating wins. This is apparently related to the fact 
that random updating may miss out on some spins for some time, whereas 
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Fig. 12. One-hit Metropolis algorithm with sequential updating: Lattice size depen- 
dence of the integrated autocorrelation time for the d = 2 Ising model at /3 = 0.4 
(assignment a0402_01 A). The ordering of the curves is identical with the ordering of the 
labels in the figure. 

sequential updating touches each spin with certainty during one sweep. 

Figures El and El illustrate 2d Ising model simulations off and on the 
critical point. Off the critical point, at (3 — 0.4, the integrated autocorre- 
lation time increases for L = 5, 10 and 20. Subsequently, it decreases to 
approach for L — > oo a finite asymptotic value. On the critical point, at 
(3 — (3 C = ln(l + \/2)/2, critical slowing down is observed, an increase 
Tint ~ L z with lattice size, where z w 2.17 is the dynamical critical ex- 
ponent, of the 2d Ising model. Estimates of z are compiled in the book by 
Landau and Binder |lf>| . 

Using another MC dynamics the critical slowing down can be overcome. 
Fig. 1141 shows the major improvements for Swendsen-Wang [2T] (SW) and 
Wolff (21 (W) cluster updating. 

Finally, Fig. El exhibit the improvements of heat bath over Metropolis 
updating for the 10-state d = 2 Potts model at (3 = 0.62. 
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Fig. 13. One-hit Metropolis algorithm with sequential updating: Lattice size depen- 
dence of the integrated autocorrelation time for the d = 2 Ising model at its critical 
temperature (assignment a0402_02 D). The ordering of the curves is identical with the 
ordering of the labels in the figure. 

13. Multicanonical Simulations 

One of the questions which ought to be addressed before performing a 
large scale computer simulation is "What are suitable weight factors for the 
problem at hand?" So far we used the Boltzmann weights as this appears 
natural for simulating the canonical ensemble. However, a broader view of 
the issue is appropriate. 

Conventional, canonical simulations calculate expectation values at a 
fixed temperature T and can, by re-weighting techniques, only be extrapo- 
lated to a vicinity of this temperature. For multicanonical simulations this is 
different. A single simulation allows to obtain equilibrium properties of the 
Gibbs ensemble over a range of temperatures. Of particular interest are two 
situations for which canonical simulations do not provide the appropriate 
implementation of importance sampling: 
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Fig. 14. Estimates of integrated autocorrelation times from simulations of the d = 2 
Ising model at the critical temperature (3 C = 0.44068679351 (assignment a0503_05). 



(1) The physically important configurations are rare in the canonical en- 
semble. 

(2) A rugged free energy landscape makes the physically important config- 
urations difficult to reach. 

MC calculation of the interface tension of a first order phase transition 
provide an example for the first situation. Let N — L d be the lattice size. For 
first order phase transition pseudo-transition temperatures (3 C (L) exist 
so that the energy distributions P(E) = P(E; L) become double peaked 
and the maxima at -E^ ax < ^max are °f equal height P ma x = -P(-E'max) = 
P (£?max)- In-between the maximum values a minimum is located at some 
energy E m i n . Configurations at E m i D are exponentially suppressed like 



Prr 



P(E mia ) = c f LPeM-.f S A) 



(117) 



where f s is the interface tension and A is the minimal area between the 
phases, A = 2L d ~ 1 for an L d lattice, cj and p are constants (computations 
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Fig. 15. Systematic updating: Comparison of the integrated autocorrelation times of 
the 1-hit and 2-hit Metropolis algorithms and the heat bath algorithm for the 10-statc 
Potts model onLxL lattices at j3 = 0.62 (assignment a0402_06). The L = 40 and L = 80 
curves lie almost on top of one another. 



of p have been done in the capillary- wave approximation). The interface 
tension can be determined by Binder's histogram method One has to 
calculate the quantities 

f'(L) = --^]nR(L) with R(L) = ^^ (118) 

and to make a FSS extrapolation of f s (L) for L — ► oo. 

For large systems a canonical MC simulation will practically never visit 
configurations at energy E = E m j a and estimates of the ratio R(L) will be 
very inaccurate. The terminology supercritical slowing down was coined 
to characterize such an exponential deterioration of simulation results with 
lattice size. 

Multicanonical simulations 3 approach this problem by sampling, 
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in an appropriate energy range, with an approximation to the weights 

^/n(E (k) ) = ^yy = exp [-b ( E ^) E + a (119) 

where n(E) is the number of states of energy E. The function b(E) defines 
the inverse microcanonical temperature and a(E) the dimensionless, 
microcanonical free energy. The function b(E) has a relatively smooth 
dependence on its arguments, which makes it a useful quantity when dealing 
with the weight factors. 

Instead of the canonical energy distribution P(E), one samples a new 
multicanonical distribution 

( niu 

(E) « c mu . (120) 
The desired canonical probability density is obtained by re-weighting 

= _gg- Pmu( - E \ e~ 0E . (121) 

This relation is rigorous, because the weights w mu {E) used in the simulation 
are exactly known. Accurate estimates of the interface tension (|1 181) become 
possible. 

The multicanonical method requires two steps: 

(1) Obtain a working estimate w m u{k) of the weights W\i n (K). Working 
estimate means that the approximation to l|119|) has to be good enough 
to ensure movement in the desired energy range. 

(2) Perform a Markov chain MC simulation with the fixed weights w mu (k). 
The thus generated configurations constitute the multicanonical en- 
semble. Canonical expectation values are found by re-weighting to the 
Gibbs ensemble and jackknife methods allow reliable error estimates. 

It is a strength of computer simulations that one can generate artificial 
(not realized by nature) ensembles, which enhance the probabilities of rare 
events one may be interested in, or speed up the dynamics. Nowadays Gen- 
eralized Ensembles (umbrella, multicanonical, 1/k, ...) have found many 
applications. Besides for first order phase transitions they are in particular 
usefull for complex systems such as biomolecules, where they accelerate the 
dynamics. For a review see |14| . 



13.1. How to get the Weights? 

To get the weights is at the heart of the method. Some approaches are: 
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(1) Overlapping, constrained (microcanonical) MC simulations. A poten- 
tial problem is to fulfill ergodicity. 

(2) FSS Estimates. This appears to be best when it works, but there may 
be no FSS theory. 

(3) General Purpose Recursions. Problem: They tend to deteriorate with 
increasing lattice size (large lattices). 

The Multicanonical Recursion (a variant of 0]): The multicanonical 
parameterization of the weights is 

w(a) = e- s ^ =e~HE a )E a+ a(E a) 5 
where (for e being the smallest energy stepsize) 

b(E) = [S(E + e) - S(E)} /e and a(E - e) = a(E) + [b(E - e) - b{E)] E . 
The recursion reads then (see for details): 

b n+1 (E) = b n {E) + g£(E) [\nH n {E + e) - hxH n (E)]/e 

gZ(E)=gZ(E)/[g n (E)+gZ(E)}, 

g%(E) =H n (E + e) H n (E) / [if™ (E + e) + H n (E)\ , 

g n+1 (E)=g n (E)+gZ(E), g°(E)=0. 

The Wang-Landau Recursion 23 : Updates are performed with 
estimators g(E) of the density of states 

p(E 1 -» E 2 ) = min ■ 

Each time an energy level is visited, the estimator of g{E) is updated ac- 
cording to 

g(E)^g(E)f 

where, initially, g(E) = 1 and f = fa = e 1 ■ Once the desired energy range 
is covered, the factor / is refined: 

fl = \/l > fn+l = V fn+l 

until some value very close to one like / = 1.00000001 is reached. Afterwards 
the usual multicanonical production runs may be carried out. 
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14. Multicanonical Example Runs (2d Ising and Potts 
models) 

Most illustrations of this section are from Ref. 6 . 

For an Ising model on a 20 x 20 lattice the multicanonical recursion is 
run in the range 

namin = 400 < iact < 800 = namax . (122) 

The recursion is terminated after a number of so called tunneling events. 
A tunneling event is defined as an updating process which finds its way 
from 

iact = namin to iact = namax and back . (123) 

This notation comes from applications to first order phase transitions. An 
alternative notation for tunneling event is random walk cycle. For most 
applications 10 tunneling events lead to acceptable weights. 

For the Ising model example run we find the requested 10 tunneling 
events after 787 recursions and 64,138 sweeps (assignment a0501_01). In 
assignment a0501_02 a similar example run is performed for the 2d 10- 
state Potts model. 

Performance: If the multicanonical weighting would remove all rele- 
vant free energy barriers, the behavior of the updating process would be- 
come that of a free random walk. Therefore, the theoretically optimal 
performance for the second part of the multicanonical simulation is 

Ttun ~ V 2 . (124) 

Recent work about first order transitions by Neuhaus and Hager JT5J shows 
that the multicanonical procedure removes only the leading free energy 
barrier, while at least one subleading barrier causes a residual supercritical 
slowing done. Up to certain medium sized lattices the behavior V 2+e gives 
a rather good effective description. For large lattices exponential slowing 
down dominates again. The slowing down of the weight recursion with the 
volume size is expected to be even (slightly) worse than that of the second 
part of the simulation. 

Re- Weighting to the Canonical Ensemble: Let us assume that 
we have performed a multicanonical simulation which covers the energy 
histograms for a temperature range 

Anin < = ^ < Anax • (125) 
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Given the multicanonical time series, where i — 1, .. . ,n labels the gener- 
ated configurations, the formula 

E » =i gffl exp + fc^W) _ Q(£ ;W)] 

£^ 1 exp[-/3£» + 6(£W)#W-o(£(9)] ' 1 j 

replaces the multicanonical weighting of the simulation by the Boltzmann 
factor. The denominator differs from the partition function Z by a constant 
factor which drops out. 

For discrete systems it is sufficient to keep histograms when only func- 
tions of the energy are calculated. For an operator C?W = /(.EW) equa- 
tion l|12tj|l simplifies to 

-j_ Ee /Cg) exp [-g g + b(E) E - a(E)} 

E E hmu(E) exp[-f3E + b{E)E-a(E)} [ ' 

where h mu (E) is the histogram sampled during the multicanonical produc- 
tion run and the sums are over all energy values for which h mu (E) has 
entries. 

The computer implementation of these equations requires care. The 
differences between the largest and the smallest numbers encountered in 
the exponents can be really large. We can avoid large numbers by dealing 
only with logarithms of sums and partial sums. For C — A + B with A > 
and B > we can calculate InC = In (.A + B) from the values In A and 
lni3, without ever storing either A or B or C (see for more details): 

min(yl, B) 



InC = In 



max(A, B) 1 



(128) 



max(v4, B) 

= max (In A, In B) + ln{l + exp [min(ln A, In A) - max(ln AAnB)]} 
= max(lnA,lnB) + ln{l + exp [-| InA - lnB\}} . 



14.1. Energy and Specific Heat Calculation 

We are now ready to produce multicanonical data for the energy per spin 
of the 2d Ising model on a 20 x 20 lattice (assignment a0501_03). The same 
numerical data allow to calculate the specific heat defined by 

C = ^| = /3 2 ((£ 2 )-(i?) 2 ) . (129) 

The comparison of the multicanonical specific heat data with the exact 
curve of Ferdinand and Fisher |1 lj is shown in Fig. 1161 (error bars rely on 
the jackknife method). 
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Fig. 16. Specific heat per spin for the Ising model on a 20 X 20 lattice: Multicanonical 
data versus exact results of Ferdinand and Fisher. This figure was first published in [6]. 

The energy histogram of this multicanonical simulation together its 
canonically re-weighted descendants at ft = 0, /? = 0.2 and (3 — 0.4 is shown 
in Fig. 1171 The normalization of the multicanonical histogram is adjusted 
so that it fits into the same figure with the three re-weighted histograms. 

It is assignment a0501_06 to produce similar data for the 2d 10-state 
Potts model and to re- weighted the multicanonical energy histogram to the 
canonical distribution at f3 = 0.71, which is close to the pseudo-transition 
temperature. The multicanonical method allows then to estimate the inter- 
face tension of the transition by following the minimum to maximum ratio 
R(L) of Eq. (|118fl over many orders of magnitude [3] as is shown in Fig. 1181 



14.2. Free Energy and Entropy Calculation 

At (3 = the Potts partition function is Z = q N . Therefore, multicanonical 
simulations allow for proper normalization of the partition function, if [3 — 
is included in the temperature range. The properly normalized partition 
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Fig. 17. Energy histogram from a multicanonical simulation of the Ising model on a 
20 X 20 lattice together with canonically re- weighted histograms (assignment a0501_04). 
This figure was first published in [6]. 

function allows to calculate the Helmholtz free energy 

F = -/r 1 hx{Z) (130) 

and the entropy 

S = = P(F-E) (131) 

of the canonical ensemble. Here E is the expectation value of the internal 
energy and the last equal sign holds because of our choice of units for the 
temperature. For the 2d Ising model as well as for the 2d 10-state Potts 
model, we show in Fig. 1191 multicanonical estimates of the entropy density 
per site 

s = S/N . (132) 

For the 2d Ising model one may also compare directly with the number 
of states. Up to medium sized lattices this integer can be calculated to all 
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Fig. 18. Energy histogram, e s = —2s + 2, for the 2d 10-state Potts models on various 
lattice sizes (re-drawn after Ref. [3] from where the notation for s comes). 

digits by analytical methods However, MC results are only sensitive to 
the first few (not more than six) digits and, therefore, one finds no real 
advantages over using other physical quantities. 

14.3. Time series analysis 

Typically, one prefers in continuous systems time series data over keeping 
histograms, because one avoids then discretization errors jx]. Even in dis- 
crete systems time series data are of importance, as one often wants to 
measure more physical quantities than just the energy. Then RAM stor- 
age limitations may require to use a time series instead of histograms. To 
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illustrate this point, we use the Potts magnetization. 

In assignments a0501_08 and a0501_09 we create the same statistics on 
20 x 20 lattices as before, including time series measurements for the energy 
and for the Potts magnetization. For energy based observables the analysis 
of the histogram and the time series data give consistent results. 

For zero magnetic field, H — 0, the expectation value of the Potts 
magnetization on a finite lattice is is simply 

M g0 = (<W = ~ q , (133) 

independently of the temperature. For the multicanonical simulation it is 
quite obvious that even at low temperatures each Potts state is visited with 
probability 1/q. In contrast to this, the expectation value of the magncti- 
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Fig. 20. The Potts magnetization per lattice site squared for the q = 2 and q 
Potts models on a 20 X 20 lattice (assignments a0501_08 and a0501_09). 



10 



zation squared 



q0 = 9 




is a non-trivial quantity. At (3 — its value is M^ 



(134) 



= q (l/q) 2 = l/q, whereas 
and q = 10 Fig. shows 



it approaches 1 for N — > oo, /? — > oo. For q = 2 
our numerical results and we see that the crossover of M^ Q from 1/g to 1 
happens in the neighborhood of the critical temperature. A FSS analysis 
would reveal that a singularity develops at (3 C , which is in the derivative of 
Mg for the second order phase transitions (q < 4) and in M^ itself for the 
first order transitions (q > 5). 
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